Increased TRIM31 gene expression is positively correlated with SARS-CoV-2 associated genes TMPRSS2 and TMPRSS4 in gastrointestinal cancers

Besides typical respiratory symptoms, COVID-19 patients also have gastrointestinal symptoms. Studies focusing on the gastrointestinal tumors derived from gastrointestinal tissues have raised a question whether these tumors might express higher levels of SARS-CoV-2 associated genes and therefore patients diagnosed with GI cancers may be more susceptible to the infection. In this study, we have analyzed the expression of SARS-CoV-2 associated genes and their co-expressions in gastrointestinal solid tumors, cancer cell lines and patient-derived organoids relative to their normal counterparts. Moreover, we have found increased co-expression of TMPRSS2-TMPRSS4 in gastrointestinal cancers suggesting that SARS-CoV-2 viral infection known to be mediated by this protease pair might facilitate the effects of viral infection in GI cancer patients. Further, our findings also demonstrate that TRIM31 expression is upregulated in gastrointestinal tumors, while the inhibition of TRIM31 significantly altered viral replication and viral processes associated with cellular pathways in gastrointestinal cancer samples. Taken together, these findings indicate that in addition to the co-expression of TMPRSS2-TMPRSS4 protease pair in GI cancers, TRIM31 expression is positively correlated with this pair and TRIM31 may play a role in providing an increased susceptibility in GI cancer patients to be infected with SARS-CoV-2 virus.

www.nature.com/scientificreports/ processes. This supports our hypothesis whereby TRIM31 may be linked to processes that mediate SARS-CoV-2 infection in GI cancers.

Results
Gene Set Enrichment Analysis-Molecular Signatures DataBase (GSEA-MSigDB) was utilized to obtain SARS-CoV-2 infection canonical pathway genes. The resulting genes are documented in Table 1. These genes were reported as responsible for the attachment of SARS-CoV-2 and its entry into cells. Initially we performed comparative gene expression analysis between normal gastrointestinal (GI) tissue samples. The analysis of gene expression changes between normal tissue samples revealed that CTSL expression in colon and esophagus mucosa, FURIN expression in pancreas, and TMPRSS2 expression in stomach was found as the highest among other genes (Fig. 1a). We next investigated the gene expression changes in GI tumor samples in comparison to their normal derivatives and observed significant changes in the expression of SARS-CoV-2-associated genes in various GI tumors (Fig. 1b). When tumor tissue samples were analyzed in comparison to normal samples, colon adenocarcinoma (COAD) and rectal adenocarcinoma (READ) samples exhibited significantly elevated expression of ACE2, SCARB1, TMPRSS2 and TMPRSS4 genes while pancreatic adenocarcinoma (PAAD) and stomach adenocarcinoma (STAD) samples demonstrated increased expression of ACE2, CTSL, NRP1, SCARB1 and TMPRSS4 genes relative to normal samples (Fig. 1b). Further, CTSL, FURIN, NRP1 and SCARB1 gene expression levels were significantly higher in esophageal carcinoma (ESCA) samples in comparison to their normal matches (Fig. 1b).  Figure S1j). After highly expressed genes were determined in GI solid tumor samples, a pairwise correlation analysis was performed to determine the relationship between these genes. Given some of the genes are co-expressed in tumor samples (Supplementary Figure S2), we reasoned that it may be possible to examine gene expression dependencies through their correlation analysis. After a pairwise correlation analysis of TCGA samples, ACE2-SCARB1, TMPRSS2-SLC6A19 and TMPRSS2-TMPRSS4 had a moderate positive correlation in COAD samples (Supplementary Figure S3a Figure S3). Taken together, these findings demonstrate that there is a positive correlation between the expression of a list of gene pairs specific to GI tumor types. Among these, TMPRSS2 and TMPRSS4 was commonly observed in all gastrointestinal tumor types.
Following a pairwise gene expression correlation analysis using GI tumor samples, the same analysis was performed using the CCLE database. By doing this, we aimed at determining whether there is a correlation between the SARS-CoV-2-associated genes similar to our observations in GI tumor types. Initial analysis was performed to eliminate cell lines exhibiting zero gene expression. Following this, the cell lines that have at least log 2 (TPM + 1) > 0 gene expression were included into the analysis. As a result, 63 colorectal cancer, 32 esophagus cancer, 41 pancreatic cancer and 29 stomach cancer cell lines were used (Supplementary Table S1). When we Table 1. Gene set that was found in SARS-CoV-2 entry into host cells. www.nature.com/scientificreports/ performed gene expression correlation analysis for TMPRSS2 and TMPRSS4 genes in GI cancer cell lines, we found a moderate and statistically significant positive correlation for this gene pair in colorectal and stomach cancer cell lines (Fig. 3a, b p-values < 0.05 and R-values 0.613 and 0.559, respectively). Similarly, we observed statistically significant positive correlation for the expression of TMPRSS2 and TMPRSS4 genes in pancreatic and esophagus cancer cell lines (Fig. 3c, d p-values < 0.05 and R-values 0.359 and 0.521, respectively). These observations indicate that the expression of TMPRSS2 and TMPRSS4 gene pair are positively correlated to GI cancer cell lines. This resembles our findings in the GI tumor samples. As TMPRSS2 and TMPRSS4 gene expression is positively correlated in both GI tumors and cell lines, we initially investigated whether there is an additional number of genes which are commonly associated with TMPRSS2 and TMPRSS4 genes in GI cancer cell lines. To address this question, we performed a correlation analysis in several GI cancer cell lines (colorectal, stomach, pancreatic, esophagus cancer cell lines) to identify a positively correlated gene(s) with TMPRSS2 and TMPRSS4. We observed TMPRSS2 and TMPRSS4 expression to be positively correlated with seven genes which are Solute Carrier  www.nature.com/scientificreports/ We next investigated the differential gene expression of these seven genes in GI solid tumors in comparison to their matched normal tissue samples. All GI tumor samples except STAD showed significantly elevated expression of SLC44A4 when compared to normal tissue samples (Supplementary Figure S4a). Moreover, TMEM45B expression was found to be significantly overexpressed in COAD, READ, STAD and PAAD except in ESCA samples (Supplementary Figure S4b). PRR15L showed highly increased expression in COAD and READ samples while RIPK3 was only significantly higher in PAAD samples relative to normal (Supplementary Figure S4c, Supplementary Figure S4d). In addition, no significant change or elevation in the expression of SH3BGRL2 was www.nature.com/scientificreports/ found in STAD and ESCA samples while it was significantly overexpressed in COAD, READ and PAAD samples (Fig. 4a). Furthermore, TJP3 gene expression was found to be highly overexpressed in COAD, READ and PAAD samples in comparison to their matched normal samples (Fig. 4b). As a result, three genes, SH3BGRL2, TJP3 and TRIM31, were positively correlated with TMPRSS2 and TMPRSS4 for all cancer cell lines and showed distinct differential expression profiles across GI solid tumors (Fig. 4). However, there was only one common gene that was both positively correlated with TMPRSS2 and TMPRSS4 in gastrointestinal cancer cell lines and significantly high in all gastrointestinal solid cancers, which was TRIM31 (Fig. 4c). There was also significant positive correlation of TRIM31 with TMPRSS2 and TMPRSS4 gene pair in all GI tumor samples (Supplementary Figure S5). After TRIM31 was found to be upregulated and positively correlated with TMPRSS2 and TMPRSS4 genes in both cancer cell lines and solid tumor samples, we aimed to investigate the relationship of TRIM31 with TMPRSS2 and TMPRSS4 using the clinically relevant model system Patient Derived Organoids (PDOs) 50,51 . RNAseq expression counts data from 87 colorectal cancer PDOs with their matched tumor samples were retrieved from the Gene Expression Omnibus (GEO) database (GSE171681). Our pair-wise gene expression correlation analysis revealed that TRIM31 gene expression in colorectal PDOs was positively correlated with TMPRSS2 and TMPRSS4 (Fig. 5a, b). Yet, there was no significant correlation observed between TRIM31 to TMPRSS2 and TMPRSS4 in pancreatic cancer PDOs, but a positive correlation was still observed between TMPRSS2 and TMPRSS4 (Fig. 5c, d). Taken together, these observations indicate that TRIM31 gene expression is positively correlated with TMPRSS2 and TMPRSS4 in GI cell lines, solid tumors and colorectal cancer PDOs.
This was followed by investigating the functional role of TRIM31 and its relationship cellular pathways using a genome-wide loss of screen approach. RNAi results for TRIM31 in colorectal cancer cell lines were obtained from the CCLE database utilizing DEMETER2 [52][53][54][55][56] . This model integrates large-scale pooled RNAi screening data expression changes. The relationship between expression correlation values and TRIM31 dependency  Table S3). Using the dataset, we performed a pathway enrichment analysis to assess TRIM31 knockdown effect in cellular processes in colorectal cancer cell lines. Initially we have determined upregulated and downregulated genes when TRIM31 was silenced using the RNAi approach (Fig. 6a). This analysis showed that number of genes downregulated genes (colored with blue circles) had bigger co-expression scores than upregulated genes (colored with red circles). This was then followed by the identification of related pathways when silencing of TRIM31 caused downregulation of www.nature.com/scientificreports/ genes marked with blue circles. This analysis demonstrated that the loss of TRIM31 affects viral transcription (GO:0019083) (p-Value = 2.36e−17) and viral processes (GO:0016032) (p-Value = 5.50e−08) with a false discovery rate; p-value < 0.05 (Fig. 6b). A list of downregulated genes interacting in 18 nodes are presented (Fig. 6c). Considering most of these genes are involved in viral transcription and viral processes, TRIM31 is likely to play a role in mediating viral transcription and viral processes in colorectal cancer. Due to the positive correlation of TRIM31 with TMPRSS2 and TMPRSS4 genes, it may therefore cause higher susceptibility of viral infections, such as SARS-CoV-2, seen in cancer patients.

Discussion
With   [59][60][61] . Not only normal GI tissues but also GI tumors may easily be infected by pathogens due to their distinct mucosal membrane protein expression profiles. We therefore characterized the expression profiles of a number of genes involved in SARS-CoV-2 infection pathway and found higher ACE2, TMPRSS2, TMPRSS4, SCARB1, CTSL and NRP1 expression in all GI tumor samples relative to their normal counterparts. Exceptions include ACE2, TMPRSS2, TMPRSS4 genes in ESCA samples; TMPRSS2 gene in STAD and PAAD samples; CTSL and NRP1 genes in COAD and READ samples; and FURIN gene in COAD, READ, STAD and PAAD samples. Among the significantly high expression profiles of receptors and mediator proteins for SARS-CoV-2 entry into cells in GI tumors, we observed high co-expression of TMPRSS2 and TMPRSS4 in all GI tumors demonstrating that these mucosa specific serine proteases can facilitate the entry of the spike protein into the cell membrane. Importantly, higher co-expression of TMPRSS2 and TMPRSS4 relative to normal was also observed in a human gastrointestinal cancer cell line dataset and colorectal cancer organoids. The latter especially holds importance due to its clinically accepted role as a relevant model system.
Given that the co-expression analysis with TMPRSS2-TMPRSS4 resulted in a positive correlation with TRIM31 in all GI samples, we further investigated the functional role of TRIM31 in GI tumors. TRIM31 is a member of E3 ubiquitin ligase family and mediates various cellular functions in gastrointestinal tissues including antiviral immune response 62 and tumorigenesis 63,64 . Despite the antiviral role of mitochondrial antiviral-signaling protein (MAVS), there have been various studies, recently shown in COVID-19 infection, demonstrating that MAVS complex can be limited to facilitate viral infection 65 . Amongst the role of inhibiting antiviral role of MAVS signalling, NEMO-like kinase was reported to induce the degradation of MAVS. In addition, PLK1 was reported to phosphorylate and destroy MAVS complex 66 . The dual role of MAVS complex can be context specific and especially in cancers, where immune dysregulation is commonly observed 67 , MAVS can be regulated differently than its antiviral function through the activation various phosphorylation events and hence limiting the antiviral function of MAVS complex. Another study also showed that antiviral immune response mediated by MAVS complex is opposed by SARS-CoV-2 nucleocapsid protein (NP), SARS-CoV-2-NP, to facilitate NP-mediated immune evasion in on colon cancer cell line CaCo-2 68 . Given the prominent consequences of COVID-19 resulting in immune deregulation and evasion 69 , antiviral immune response can be impeded and therefore viral infection can be promoted in cancer cells. Furthermore, another study particularly demonstrated that MAVS expression was downregulated in human colon cancer tissue samples 70 . Based on this study, we also checked MAVS expression in CRC samples and found that MAVS expression was significantly downregulated in CRC samples in comparison to normal samples (Supplementary Figure S6). Based on this finding and supporting studies 65,66 opposing the antiviral function of MAVS signalling, we think that decreased MAVS expression can impede antiviral immune response. In the case of viral infection, attenuated levels of MAVS may result in promoting SARS-CoV-2 viral infection and consequently promoting the viral evasion. As the MAVS expression is downregulated, TRIM31 may not perform its function as it normally mediates antiviral immune response via its interaction with the MAVS complex. TRIM31 has been previously reported to play a role in activation of NF-κB signalling in colorectal cancer 71 . In addition, NF-κB signalling was shown to be activated by viral infection 72 and www.nature.com/scientificreports/ play a role in COVID-19 pathogenicity [73][74][75] . In this process, TRIM31 may play a role in inducing the expression of NK-κB and therefore the activation of inflammation might contribute to the viral processes. To support the role of TRIM31 in regulation viral processes, in Fig. 6, we show that inhibition of TRIM31 in colon cancer cell lines results in downregulation of viral transcription/processes related cellular pathways. In addition, TMPRSS4 alongside with TMPRSS2 mediate SARS-CoV-2 viral entry in gut epithelial cells 76 and highly expressed in GI cancer types 77 . These two proteases facilitating viral entry are co-expressed with TRIM31 which can act as promoting viral transcription and viral processes related pathways. To support this notion, we have shown that MAVS normally interacting with TRIM31 to facilitate antiviral immune response is downregulated in CRC samples (Supplementary Figure S6) and therefore TRIM31 may not function as it is normally reported in antiviral immune response. Besides the expression of TMPRSS2, TMPRSS4, and TRIM31 in GI cancers, we have analysed the expression of these genes in all cancers, and we have found that the expression of these genes is more upregulated in majority of GI cancers than individual other cancer types (Supplementary Figure S7, Supplementary Figure S8, Supplementary Figure S9). It is important to note that we cannot rule out the possibility that other cancer types might also have increased susceptibility to COVID-19 infection. For example, other cancer types such as Lung Adenocarcinoma (LUAD) or Liver Hepatocellular Carcinoma (LIHC) may also exhibit increased susceptibility to COVID-19 infection via similar mechanisms governed by TMPRSS2, TMPRSS4, and TRIM31, however this remains to be further investigated.
TRIM31 RNAi expression profile obtained from 20 colorectal cancer cell lines using CCLE dataset provided evidence for TRIM31 dependency in these cell lines. Interaction network analysis performed following the TRIM31 knockdown in colorectal cancer cell lines highlighted a possible functional role of TRIM31 in viral replication and viral processes. This may therefore be important to develop strategies for targeting TRIM31 particularly in colorectal tumors. Studies showed that protease inhibitors such as against TMPRSS2, CTSL or other proteases may play a role in tackling SARS-CoV-2 viral entry into cells as these proteases are known to regulate viral infection [78][79][80] . For example, camostat-mesylate, a TMPRSS2 inhibitor, was found to be successful for the inhibition of viral entry into lung epithelial cells 27 . In the light of our findings where TRIM31 is positively correlated with TMPRSS2-TMPRSS4 in GI tumors and knockdown of TRIM31 mediates viral replication and viral processes, developing inhibitors targeting TRIM31 may decrease viral infection efficiency of SARS-CoV-2 in colorectal cancer patients. Taken together, our findings demonstrate that TMPRSS2-TMPRSS4-TRIM31 genes are positively correlated together in colorectal cancer which may facilitate SARS-CoV-2 entry into cells. Hence, targeting TRIM31 may play an important role in preventing SARS-CoV-2 viral infection in colorectal cancer patients.  (Table S1). To further investigate the expression correlations, two independent studies and three sets of RNA-seq data from gastrointestinal tumour organoid studies (GSE171680, GSE171681 51 and GSE139184 50 ) were used to identify gene expression correlations. Transcriptomic datasets of patient derived organoid models for only colorectal and pancreatic cancer were used since there was no available gastric and esophagus tumour organoid data and processed transcriptomic datasets available at the National Centre for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) under accession numbers: GSE171680, GSE171681 and GSE139184 (https:// www. ncbi. nlm. nih. gov/ geo). Genes selection and differential expression. The receptor and transmembrane proteins/genes that facilitate the entry of SARS-CoV-2 into cells used in this study were selected through Molecular Signature Database of the Gene Set Enrichment (GSEA-msigdb) repository (https:// www. gsea-msigdb. org/) query and the gene set in WP_SARSCOV2_AND_COVID19_PATHWAY was used 85 . For comparative and differential expression analyses, the gene list given in Table 1 was used for the analysis of TCGA datasets for solid tumor samples with their control matches and corresponding normal tissue samples in GTEx. Gene Expression Profiling Interactive Analysis portal associated with its python package was used to compare tumor samples with normal tissue 81 . The expression data were first log2(TPM + 1) transformed for differential analysis and the log 2 FC was identified as median difference between tumors and normal in this portal. Genes with higher |log 2 FC| values were considered differentially expressed genes. Gene expression values of GTEx samples (normal) were produced with RNA-SeQC v1.1.9 and built-in tool in their websites were used to generate a normal tissue heatmap 33 . Tissue-wise differential expression of one gene or a multi-gene signature in gastrointestinal cancer types (colon adenocarcinoma, COAD; esophagus carcinoma, ESCA; pancreatic adenocarcinoma, PAAD; rectal adenocarcinoma, READ; and stomach adenocarcinoma, STAD) was determined. For differential gene expression analysis for solid tumors of TCGA, log 2 (TPM) was used for logscale comparison and |log 2 FC| cut-off as 1 was applied. Moreover, some of the cell lines were excluded due to zero expression or no expression data available for corresponding genes. www.nature.com/scientificreports/ Gene set correlation analysis. A linear regression analysis (Pearson's correlation) was performed to explore the relationship between differentially expressed genes for each dataset. To define the relationship between differentially expressed genes, continuous variables were analyzed using Pearson's correlation analysis. For correlation analyses, the non-log scale was used for calculation and the log-scale axis was used for visualization in TCGA samples, in CCLE samples and in other datasets.

Loss-of-function screens and network analysis.
Dependency analysis for cancer cell lines was performed using RNAi dependency dataset (Achilles + DRIVE + Marcotte, DEMETER2). The resulting gene expression changes demonstrating only moderate or positive relationship were further investigated. Protein-protein interaction networks functional enrichment analysis was performed using STRING (https:// string-db. org) by all types of interactions with highest confidence interaction score (0.900). Disconnected nodes were excluded from visualization.
Statistical analysis. Pearson's correlation was calculated and concluded as strong when R value was larger than 0.7, moderate when R value was between 0.3 and 0.7 and weak when R value was between 0 and 0.3. Data were transformed into log 2 (TPM) from raw TPM values for pancreatic cancer organoids data and filtered according to related genes by using denoted packages with RStudio. Boxplots were generated using one-way ANOVA via GEPIA2 portal using disease status (Tumor and Normal) as variables for calculating differential expression and p-value < 0.05 was considered as significant for all differential gene expressions and correlation analyses. General data handling, filtering, and plotting was performed using RStudio (http:// www. rstud io. com) with installed packages such as dplyr, tidyverse and ggpubr [86][87][88] .